- Causal Inference
- Overview
- Causal inference estimates the effect of an intervention on an outcome, going beyond correlation to establish genuine cause-and-effect relationships.
- When to Use
- Evaluating the impact of policy interventions or business decisions
- Estimating treatment effects when randomized experiments aren't feasible
- Controlling for confounding variables in observational data
- Determining if a marketing campaign or product change caused an outcome
- Analyzing heterogeneous treatment effects across different user segments
- Making causal claims from non-experimental data using propensity scores or instrumental variables
- Key Concepts
- Treatment: Intervention or exposure
- Outcome: Result or consequence
- Confounding: Variables affecting both treatment and outcome
- Causal Graph: Visual representation of causal relationships
- Treatment Effect: Impact of the intervention
- Selection Bias: Non-random treatment assignment
- Causal Methods
- Randomized Controlled Trials (RCT): Gold standard
- Propensity Score Matching: Balance treatment/control groups
- Difference-in-Differences: Before/after comparison
- Instrumental Variables: Handle endogeneity
- Causal Forests: Heterogeneous treatment effects
- Implementation with Python

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from scipy import stats
```
```python
# Generate observational data with confounding
np.random.seed(42)
n = 1000

# Confounder: age (affects both treatment and outcome)
age = np.random.uniform(25, 75, n)

# Treatment: training program (more likely for younger people)
treatment_prob = 0.3 + 0.3 * (75 - age) / 50  # inverse relationship with age
treatment = (np.random.uniform(0, 1, n) < treatment_prob).astype(int)

# Outcome: salary (affected by both treatment and age)
# True causal effect of treatment: +$5000
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)

df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
print("Observational Data Summary:")
print(df.describe())
print(f"\nTreatment Rate: {df['treatment'].mean():.1%}")
print(f"Average Salary (Control): ${df[df['treatment'] == 0]['salary'].mean():.0f}")
print(f"Average Salary (Treatment): ${df[df['treatment'] == 1]['salary'].mean():.0f}")
```
```python
# 1. Naive comparison (BIASED - ignores confounding)
naive_effect = (df[df['treatment'] == 1]['salary'].mean()
                - df[df['treatment'] == 0]['salary'].mean())
print(f"\n1. Naive Comparison: ${naive_effect:.0f} (BIASED)")
```
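The `scipy.stats` import in the listing is otherwise unused; one natural use for it (my addition, not in the original) is to test whether the naive gap is statistically distinguishable from zero. The data-generating process is regenerated below so the snippet runs standalone:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# Welch's t-test on the raw treated/control salary gap
t_stat, p_value = stats.ttest_ind(df[df['treatment'] == 1]['salary'],
                                  df[df['treatment'] == 0]['salary'],
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value only says the group means differ; the difference is
# still confounded by age, so it is not the causal effect.
```

Statistical significance and causal validity are separate questions: the naive gap here is significant yet biased.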
```python
# 2. Regression adjustment (covariate adjustment)
X = df[['treatment', 'age']]
y = df['salary']
model = LinearRegression()
model.fit(X, y)
regression_effect = model.coef_[0]  # coefficient on treatment
print(f"\n2. Regression Adjustment: ${regression_effect:.0f}")
```
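A sanity check on the regression-adjustment number (my addition, not in the original listing) is the Frisch-Waugh-Lovell identity: residualizing both salary and treatment on age, then regressing residual on residual, must reproduce exactly the same treatment coefficient. A sketch under the same simulated data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# Direct multivariate regression: coefficient on treatment
direct = LinearRegression().fit(df[['treatment', 'age']], df['salary']).coef_[0]

# FWL: regress age out of both outcome and treatment, then regress residuals
A = df[['age']]
res_y = df['salary'] - LinearRegression().fit(A, df['salary']).predict(A)
res_t = df['treatment'] - LinearRegression().fit(A, df['treatment']).predict(A)
fwl = LinearRegression().fit(res_t.to_frame(), res_y).coef_[0]

print(f"direct: {direct:.2f}, FWL: {fwl:.2f}")
```

The agreement makes explicit that regression adjustment estimates the treatment-salary association after the age channel has been removed.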
```python
# 3. Propensity score matching
# Estimate probability of treatment given covariates
ps_model = LogisticRegression()
ps_model.fit(df[['age']], df['treatment'])
df['propensity_score'] = ps_model.predict_proba(df[['age']])[:, 1]
print("\n3. Propensity Score Matching:")
print(f"PS range: [{df['propensity_score'].min():.3f}, {df['propensity_score'].max():.3f}]")

# Matching: find the closest control unit for each treated unit,
# within a +/-0.1 caliper on the propensity score
matched_pairs = []
treated_units = df[df['treatment'] == 1].index
for treated_idx in treated_units:
    treated_ps = df.loc[treated_idx, 'propensity_score']
    control_units = df[(df['treatment'] == 0)
                       & (df['propensity_score'] >= treated_ps - 0.1)
                       & (df['propensity_score'] <= treated_ps + 0.1)].index
    if len(control_units) > 0:
        closest_control = min(
            control_units,
            key=lambda x: abs(df.loc[x, 'propensity_score'] - treated_ps))
        matched_pairs.append({
            'treated_idx': treated_idx,
            'control_idx': closest_control,
            'treated_salary': df.loc[treated_idx, 'salary'],
            'control_salary': df.loc[closest_control, 'salary'],
        })

matched_df = pd.DataFrame(matched_pairs)
psm_effect = (matched_df['treated_salary'] - matched_df['control_salary']).mean()
print(f"PSM Effect: ${psm_effect:.0f}")
print(f"Matched pairs: {len(matched_df)}")
```
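The deliverables list at the end mentions a covariate balance assessment, which the listing never performs. A minimal sketch (my addition) computing the standardized mean difference of age before and after nearest-neighbor matching on the propensity score; the matching here is with replacement, a simplification of the caliper match above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]

def smd(treated, control):
    """Standardized mean difference for one covariate."""
    pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

before = smd(df[df['treatment'] == 1]['age'], df[df['treatment'] == 0]['age'])

# Nearest-neighbor match on the propensity score (with replacement)
controls = df[df['treatment'] == 0]
matched_ages = []
for _, row in df[df['treatment'] == 1].iterrows():
    nearest = (controls['ps'] - row['ps']).abs().idxmin()
    matched_ages.append(controls.loc[nearest, 'age'])
after = smd(df[df['treatment'] == 1]['age'], pd.Series(matched_ages))

print(f"SMD(age) before matching: {before:.3f}")
print(f"SMD(age) after matching:  {after:.3f}")
```

A common rule of thumb is that |SMD| below 0.1 after matching indicates acceptable balance.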
```python
# 4. Stratification by propensity score (5 quantile strata)
df['ps_stratum'] = pd.qcut(df['propensity_score'], q=5, labels=False, duplicates='drop')
stratified_effects = []
for stratum in df['ps_stratum'].unique():
    stratum_data = df[df['ps_stratum'] == stratum]
    if ((stratum_data['treatment'] == 0).sum() > 0
            and (stratum_data['treatment'] == 1).sum() > 0):
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        stratified_effects.append(treated_mean - control_mean)
stratified_effect = np.mean(stratified_effects)
print(f"\n4. Stratification by PS: ${stratified_effect:.0f}")
```
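The listing averages the stratum effects with equal weights; with quantile strata of (nearly) equal size the result is essentially the same, but the standard stratified ATE estimator weights each stratum by its share of the sample. A sketch of the weighted version (my addition, same simulated data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]
df['stratum'] = pd.qcut(df['ps'], q=5, labels=False, duplicates='drop')

# Weight each stratum's effect by its share of the sample
effects, weights = [], []
for _, g in df.groupby('stratum'):
    if g['treatment'].nunique() == 2:  # need both arms in the stratum
        effects.append(g[g['treatment'] == 1]['salary'].mean()
                       - g[g['treatment'] == 0]['salary'].mean())
        weights.append(len(g))
ate = np.average(effects, weights=weights)
print(f"Stratified ATE (size-weighted): ${ate:.0f}")
```

Weighting by the share of *treated* units in each stratum instead would give an ATT-style estimate.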
```python
# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Treatment distribution by age
ax = axes[0, 0]
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
ax.hist(control['age'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(treated['age'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')
ax.set_title('Age Distribution by Treatment')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Salary vs age, colored by treatment
ax = axes[0, 1]
ax.scatter(control['age'], control['salary'], alpha=0.5, label='Control', s=30)
ax.scatter(treated['age'], treated['salary'], alpha=0.5, label='Treated', s=30, color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Salary')
ax.set_title('Salary vs Age by Treatment')
ax.legend()
ax.grid(True, alpha=0.3)

# Propensity score distribution
ax = axes[1, 0]
ax.hist(df[df['treatment'] == 0]['propensity_score'], bins=20, alpha=0.6,
        label='Control', color='blue')
ax.hist(df[df['treatment'] == 1]['propensity_score'], bins=20, alpha=0.6,
        label='Treated', color='red')
ax.set_xlabel('Propensity Score')
ax.set_ylabel('Frequency')
ax.set_title('Propensity Score Distribution')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Treatment effect comparison
ax = axes[1, 1]
methods = ['Naive', 'Regression', 'PSM', 'Stratified']
effects = [naive_effect, regression_effect, psm_effect, stratified_effect]
true_effect = 5000
ax.bar(methods, effects, color=['red', 'orange', 'yellow', 'lightgreen'],
       alpha=0.7, edgecolor='black')
ax.axhline(y=true_effect, color='green', linestyle='--', linewidth=2,
           label=f'True Effect (${true_effect:.0f})')
ax.set_ylabel('Treatment Effect ($)')
ax.set_title('Treatment Effect Estimates by Method')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
for i, effect in enumerate(effects):
    ax.text(i, effect + 200, f'${effect:.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()
```
```python
# 6. Doubly robust (AIPW) estimation
from sklearn.ensemble import RandomForestRegressor

# Propensity score model
ps_model_dr = LogisticRegression().fit(df[['age']], df['treatment'])
ps_scores = ps_model_dr.predict_proba(df[['age']])[:, 1]

# Outcome model
outcome_model = RandomForestRegressor(n_estimators=50, random_state=42)
outcome_model.fit(df[['treatment', 'age']], df['salary'])

# Outcome predictions with treatment set to 1 and to 0 for every unit
X_treated = df[['treatment', 'age']].assign(treatment=1)
X_control = df[['treatment', 'age']].assign(treatment=0)
pred_treated = outcome_model.predict(X_treated)
pred_control = outcome_model.predict(X_control)

# AIPW estimator: outcome-model prediction plus an inverse-propensity
# correction; consistent if either model is correctly specified
t = df['treatment'].to_numpy()
y_obs = df['salary'].to_numpy()
ps_clipped = np.clip(ps_scores, 0.01, 0.99)  # guard against extreme weights
mu1 = pred_treated + t * (y_obs - pred_treated) / ps_clipped
mu0 = pred_control + (1 - t) * (y_obs - pred_control) / (1 - ps_clipped)
dr_effect = (mu1 - mu0).mean()
print(f"\n6. Doubly Robust Estimation: ${dr_effect:.0f}")
```
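For contrast with the doubly robust combination, the propensity-score-only route is the normalized (Hájek) inverse-propensity-weighting estimator. A sketch (my addition, same simulated data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
t = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * t + np.random.normal(0, 10000, n)

ps = LogisticRegression().fit(age.reshape(-1, 1), t).predict_proba(age.reshape(-1, 1))[:, 1]
ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights

# Hajek (normalized) IPW estimator of the ATE:
# reweight each arm to look like the full population
mu1 = np.sum(t * salary / ps) / np.sum(t / ps)
mu0 = np.sum((1 - t) * salary / (1 - ps)) / np.sum((1 - t) / (1 - ps))
ipw_effect = mu1 - mu0
print(f"IPW estimate: ${ipw_effect:.0f}")
```

Unlike AIPW, this estimator relies entirely on the propensity model being correct; the doubly robust version keeps a second chance via the outcome model.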
```python
# 7. Heterogeneous treatment effects by age quartile
print("\n7. Heterogeneous Treatment Effects (by Age Quartile):")
for age_q in pd.qcut(df['age'], q=4, duplicates='drop').unique():
    mask = (df['age'] >= age_q.left) & (df['age'] < age_q.right)
    stratum_data = df[mask]
    if ((stratum_data['treatment'] == 0).sum() > 0
            and (stratum_data['treatment'] == 1).sum() > 0):
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        print(f"  Age {age_q.left:.0f}-{age_q.right:.0f}: ${effect:.0f}")
```
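The quartile comparison is a coarse view of heterogeneity. A simple model-based CATE estimator in the spirit of the causal forests mentioned earlier is the T-learner: fit one outcome model per arm and difference their predictions. A sketch (my addition, hedged; real causal-forest implementations such as `econml` or `grf` are more principled):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# T-learner: fit separate outcome models on treated and control units
m1 = RandomForestRegressor(n_estimators=100, random_state=0)
m0 = RandomForestRegressor(n_estimators=100, random_state=0)
m1.fit(df[df['treatment'] == 1][['age']], df[df['treatment'] == 1]['salary'])
m0.fit(df[df['treatment'] == 0][['age']], df[df['treatment'] == 0]['salary'])

# CATE(x) = E[Y | T=1, x] - E[Y | T=0, x]
cate = m1.predict(df[['age']]) - m0.predict(df[['age']])
print(f"Mean CATE: ${cate.mean():.0f}  (true effect is constant at $5000)")
```

Because the simulated effect is constant, any apparent variation in `cate` here is estimation noise; on real data the per-segment estimates are the object of interest.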
```python
# 8. Sensitivity analysis: how a hidden confounder would shift the estimate
print("\n8. Sensitivity Analysis (Hidden Confounder Impact):")
for hidden_effect in [1000, 2000, 5000, 10000]:
    adjusted_effect = regression_effect - hidden_effect * 0.1
    print(f"  If hidden confounder worth ${hidden_effect}: Effect = ${adjusted_effect:.0f}")
```
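The loop above only subtracts a hypothetical bias term. A more direct demonstration (my addition, with a made-up data-generating process) is to simulate a hidden confounder `U` and compare the regression that can see it against the one that cannot:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical setup: hidden confounder U drives both treatment and salary
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(0, 1, n)
treatment = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-u))).astype(int)
salary = 40000 + 5000 * treatment + 4000 * u + rng.normal(0, 5000, n)
df = pd.DataFrame({'u': u, 'treatment': treatment, 'salary': salary})

# Regression that adjusts for U recovers the true $5000 effect ...
with_u = LinearRegression().fit(df[['treatment', 'u']], df['salary']).coef_[0]
# ... while omitting U leaves the estimate badly biased upward
without_u = LinearRegression().fit(df[['treatment']], df['salary']).coef_[0]
print(f"adjusting for U: ${with_u:.0f}; omitting U: ${without_u:.0f}")
```

This is the failure mode the unconfoundedness assumption rules out: no adjustment method listed above can fix it if `U` is unobserved.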
```python
# 9. Summary table
print("\n" + "=" * 60)
print("CAUSAL INFERENCE SUMMARY")
print("=" * 60)
print(f"True Treatment Effect: ${true_effect:,.0f}")
print("\nEstimates:")
print(f"  Naive (BIASED):            ${naive_effect:,.0f}")
print(f"  Regression Adjustment:     ${regression_effect:,.0f}")
print(f"  Propensity Score Matching: ${psm_effect:,.0f}")
print(f"  Stratification:            ${stratified_effect:,.0f}")
print(f"  Doubly Robust:             ${dr_effect:,.0f}")
print("=" * 60)
```
```python
# 10. Causal graph (text representation)
print("\n10. Causal Graph (DAG):")
print("""
    Age → Treatment ← (Selection Bias)
     ↓        ↓
     └─→ Salary

Interpretation:
- Age is a confounder
- Treatment causally affects Salary
- Age directly affects Salary
- Age affects probability of Treatment
""")
```
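The same DAG can also be encoded programmatically. A minimal sketch (my addition, plain dictionaries, no graph library assumed) that identifies confounders as common ancestors of treatment and outcome:

```python
# Encode the DAG from the text drawing as an adjacency mapping
dag = {
    'Age': ['Treatment', 'Salary'],
    'Treatment': ['Salary'],
    'Salary': [],
}

def ancestors(graph, node):
    """All nodes with a directed path into `node`."""
    result = set()
    frontier = [p for p, children in graph.items() if node in children]
    while frontier:
        p = frontier.pop()
        if p not in result:
            result.add(p)
            frontier.extend(q for q, children in graph.items() if p in children)
    return result

# Confounders: common ancestors of treatment and outcome,
# excluding the treatment itself (it is an ancestor of the outcome)
confounders = (ancestors(dag, 'Treatment') & ancestors(dag, 'Salary')) - {'Treatment'}
print(confounders)  # → {'Age'}
```

On this three-node graph the answer is obvious, but the same ancestor logic scales to larger DAGs and tells you which variables must enter the adjustment set.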
- Causal Assumptions
- Unconfoundedness: No unmeasured confounders
- Overlap: Common support on propensity scores
- SUTVA (Stable Unit Treatment Value Assumption): No interference between units
- Consistency: Single version of treatment
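The overlap assumption can be checked directly on the fitted propensity scores. A sketch (my addition, same simulated data as the implementation):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
df = pd.DataFrame({'age': age, 'treatment': treatment})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]

# Common support: the propensity-score range where both arms have units
lo = max(df[df['treatment'] == 1]['ps'].min(), df[df['treatment'] == 0]['ps'].min())
hi = min(df[df['treatment'] == 1]['ps'].max(), df[df['treatment'] == 0]['ps'].max())
outside = ((df['ps'] < lo) | (df['ps'] > hi)).mean()
print(f"Common support: [{lo:.3f}, {hi:.3f}]; {outside:.1%} of units outside")
```

Units outside the common-support region are typically trimmed before matching or weighting, since no comparable unit exists in the other arm.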
- Treatment Effect Types
- ATE: Average Treatment Effect (overall population)
- ATT: Average Treatment effect on the Treated
- CATE: Conditional Average Treatment Effect
- HTE: Heterogeneous Treatment Effects
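The implementation above targets the ATE; the ATT can be estimated by reweighting controls with propensity odds so they resemble the treated population. A sketch (my addition, same simulated data, where the two coincide because the true effect is constant):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
t = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * t + np.random.normal(0, 10000, n)

ps = LogisticRegression().fit(age.reshape(-1, 1), t).predict_proba(age.reshape(-1, 1))[:, 1]
ps = np.clip(ps, 0.01, 0.99)

# ATT: treated mean minus odds-weighted control mean; the weights
# ps/(1-ps) reshape the control group to match the treated population
w = ps[t == 0] / (1 - ps[t == 0])
att = salary[t == 1].mean() - np.average(salary[t == 0], weights=w)
print(f"ATT estimate: ${att:.0f}")
```

When effects vary across units (true HTE), ATT and ATE diverge, and which one matters depends on the policy question.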
- Method Strengths
- RCT: Gold standard, controls all confounders
- Matching: Balances groups, preserves overlap
- Regression: Adjusts for covariates
- Instrumental Variables: Handles endogeneity
- Causal Forests: Learns heterogeneous effects
- Deliverables
- Causal graph visualization
- Treatment effect estimates
- Sensitivity analysis
- Heterogeneous treatment effects
- Covariate balance assessment
- Propensity score diagnostics
- Final causal inference report